[manpage_begin tth 3tcl 0.1]
[comment {$Id: tth.man 20 2007-08-17 00:40:38Z khomoutov $}]

[copyright {2007 Konstantin Khomoutov <flatworm@users.sourceforge.net>}]
[moddesc {Tiger Hash, Tiger Tree Hash, THEX}]

[require Tcl ?8.1?]
[require tth ?0.1?]
[comment {
[usage [cmd tth::tiger] [opt options] [arg bitstring]]
[usage [cmd tth::tth] [cmd digest] [opt options] [cmd -string] [arg bitstring]]
[usage [cmd tth::tth] [cmd digest] [opt options] [cmd -chan] [arg channel]]
[usage [cmd set] tthContext \[[cmd tth::tth] [cmd init]\]]
[usage [cmd tth::tth] [cmd update] [arg tthContext] [arg bitstring]]
[usage [cmd tth::tth] [cmd digest] [opt -context] [arg tthContext]]
}]

[description]

This module implements several Tcl commands for calculating
Tiger Hash (v1) and Tiger Tree Hash (TTH) (with the support for
Tree Hash Exchange (THEX) output format). The module is written
in C and provides for various strategies of processing the input data
as well as several formats for resulting digests.

[para]

In general, this module provides two commands: [cmd tiger] and
[cmd tth] which are contained in the namespace [namespace ::tth].
(Both commands can be imported from their namespace.)

[para]

The [cmd tiger] command is able to calculate Tiger Hash (v1) on
a bitstring. It's inherently limited in its operation since it
requires all the input data to be passed at once so it can hardly
be used on huge files, for instance.

[para]

[cmd tiger] supports three standard bit lengths of the resulting
digest: 192 (the default), 160 and 128 bits. Resulting hash
can be output as either a "raw" bitstring or a string of
hexadecimal digits representing bytes of the bitstring.

[para]

The [cmd tth] command is able to calculate Tiger Tree Hash (TTH)
using the so-called Merkele Hash Tree approach and can be
used on huge amounts of data. It supports processing of a single
bitstring (like the [cmd tiger] command), a stream of data from
an open Tcl channel, or it can operate using "hashing context":
at first the context is created, then it's updated using
arbitrary number of data chunks and finally a resulting digest is
returned.

[para]

[cmd tth] supports the same bit lengths and output formats as
[cmd tiger] plus one additional format called Tree Hash Exchange
(or THEX for short). This is essentially a Base32 representation
of the resulting digest with any trailing "=" characters resulting
from the base32 "padding" removed
(See also [sectref {THEX FORMAT}]).

[para]

Tiger Tree hashes in THEX format are used in several peer-to-peer
file sharing protocols (namely, DirectConnect (also known as NMDC)
and ADC) to ensure integrity of the transferred files and also
for identifying several copies of the same file possessed by
different peers to provide for so-called "multiple segment
downloading". Implementation of TTH with THEX was the immediate goal
for creation of this module.

[subsection {Notes on output format}]

The Tiger Hash specification defines the result of applying this
algorithm as three 64-bit (wide) words, and no
statements are made to postulate the "reference interpretation"
of these integers in the sense of converting them to the resulting
bitstring.

[para]

This module uses the format that is known to be the accepted
practice and is also documented in the official test vectors for
the Tiger Hash (v1) (provided by the NESSIE project).
This format defines the resulting bitstring as if the three
64-bit words of the calculated digest are turned big-endian
and then concatenated left-to-right from the first to the third.
"Big-endian" in this case means "most significant [emph octet]
first", not [emph bit.]

[para]

So any command implemented by this package returns an array of
bytes produced as described above or some of its printable
representations.

This is made intentionally to make the produced results and
its interpretation on a script level independent of the
host platform "endianness".

This means that if you need to get your hands on the original
three 64-bit words of the resulting digest you have to
apply something like this to the result of the provided
commands (assuming they are told to output the "raw" bitstring):

[example {binary scan [tth::tiger -raw $input] W* words}]

[subsection {Notes on input format}]

While Tiger Hash is known to operate on bits the Tcl's least
"addressable" unit is one character (or one octet in the case of
binary strings which are of most use when representing data
to be hashed). This is OK for files and other data represented
in octets but if you need to operate on quantities of bits not
evenly divisable by 8 you have to "cook" binary strings
from them.

[para]

TODO: provide more info on constructing binary strings from
Tcl's "lists of bits" and padding for them.

[section {COMMON OPTIONS}]

All commands in this package accept options.  Options are parsed
from left to right and for the sake of simplcity no conflicts
between them, if any, are reported -- later options simply
override earlier. Some options select default parameters and are
redundant.

[para]

Several options controlling the format of the resulting digest
are common to both commands; they are:
[list_begin opt]
	[opt_def -192]
	Requests the length of the resultig hash to be 192 bits (or
	24 octets).  This is the so-called full hash length, Tiger
	Hash is always computed as a 192-bit sequence. This length
	is recommended for usage by the authors of the Tiger Hash
	algorythm.

	[opt_def -160]
	Requests the length of the resulting hash to be
	160 bits (or 20 octets). These are just the first 160 bits
	of the full-length 192 bit hash.

	[opt_def -128]
	Requests the length of the requlting hash to be
	128 bits (or 16 octets). These are just the first 128 bits
	of the full-length 192 bit hash.

	[opt_def -hex]
	Requests the output format to be a string of
	lowercase hexadecimal digit characters representing
	octets of the resulting hash (first octet goes as the
	two leftmost characters in the string and so on).
	This format is analogous to that used by the [cmd {md5 -hex}]
	command from tcllib, for example.

	[opt_def -raw]
	Requests the output format to be a "raw" bitstring, i.e. the
	hash as it was computed (but see [sectref {Notes on output
	format}] above).
[list_end]

[section COMMANDS]

[subsection [cmd tth::tiger]]

This command generates the digest of the supplied bitstring
using the Tiger Hash (v1) algorithm.
The format of this command is:
[list_begin definitions]
	[call tth::tiger [opt options] [arg bitstring]]
[list_end]

[comment {TODO: provide proper invocation format}]

[para]

The valid options are:
[list_begin opt]
	[opt_def -192]
	[emph (Default)]
	Requests the length of the resultig hash to be 192 bits.

	[opt_def -160]
	Requests the length of the resultig hash to be 160 bits.

	[opt_def -128]
	Requests the length of the resultig hash to be 128 bits.

	[opt_def -hex]
	[emph (Default)]
	Requests the format of the result to be a string of
	hexadecimal digit characters representing octets of
	the resulting hash.

	[opt_def -raw]
	Requests the format of the result to be a "raw" bistring
	as was computed by the algorithm.
[list_end]

[para]

See [sectref {COMMON OPTIONS}] for more detailed description
of these options.

[para]

This command returns the calculated digest using selected
output format. The default format is hexadecimal representation
of the full 192-bit result.

[subsection [cmd tth::tth]]

This command calculates digest of the input data using the Tiger
Three Hash algorythm processing the input data by 1024-byte blocks.
It has several modes of operation:
[list_begin bullet]
	[bullet]
	Processing of the supplied bitstring (in the same way as
	[cmd tth::tiger] works).

	[bullet]
	Processing all the data from the given Tcl channel from the
	current stream position (if any) to its EOF.

	[bullet]
	"Gradual update" mode: the digest "context" is created first
	and the handle to it is passed to the user code. Then the
	user can sequentially update the digest with arbitrary
	number of data chunks of arbitrary size using that handle.
	Finally, the digest is returned when requested and the
	context is freed.
[list_end]

[para]

Contrary to [cmd tth::tiger], this command has several subcommands:
[list_begin definitions]
	[call {tth::tth init}]
	Allocates new digest context and returns a handle to it.
	This subcommand accepts no arguments.

	[call {tth::tth update} [arg tthContext] [arg bitstring]]
	Updates the digest associated with the given [arg tthContext]
	using provided [arg bitstring].
	[arg tthContext] must be a value
	returned by a previous call to [cmd {tth::tth init}].

	[call {tth::tth digest} [opt options] [opt -context] [arg tthContext]]
	Returns the digest associated with the given [arg tthContext].
	[arg tthContext] must be a value
	returned by a previous call to [cmd {tth::tth init}].
	That context is automatically freed after this command is
	finished; any futher attempts to use it will fail.

	[call {tth::tth digest} [opt options] [option -string] [arg bitstring]]
	Calculates Tiger Tree Hash on a given bitstring and returns
	it as a result.

	[call {tth::tth digest} [opt options] [option -chan] [arg tclChannel]]
	Reads the data from a given open Tcl channel from its current position
	until end-of-file condition and calculates Tiger Tree Hash on this
	data. The resulting hash is returned.
	See [sectref {USAGE CONSIDERATIONS}] for additional details.
[list_end]

[para]

Each form of the [cmd {tth::tth digest}] subcommand accepts several
options controlling the output format of the resulting digest:
[list_begin opt]
	[opt_def -192]
	[emph (Default)]
	Requests the length of the resultig hash to be 192 bits.

	[opt_def -160]
	Requests the length of the resultig hash to be 160 bits.

	[opt_def -128]
	Requests the length of the resultig hash to be 128 bits.

	[opt_def -thex]
	[emph (Default)]
	Requests the format of the result to be a string of ASCII
	characters representing the resulting digest in the
	Tree Hash Exchange format.
	See [sectref {THEX FORMAT}] for details.

	[opt_def -hex]
	Requests the format of the result to be a string of
	hexadecimal digit characters representing octets of
	the resulting hash.

	[opt_def -raw]
	Requests the format of the result to be a "raw" bistring
	as was computed by the algorithm.
[list_end]

[section {THEX FORMAT}]

Tree Hash Exchange (THEX) format is described in
[sectref REFERENCES [lb]2[rb]].

[para]

The specification presented in this document is somewhat
complicated since it tries to cover not only the format of the
hash itself but also a higher-level framework for its
encapsulation.
What is meant under the term "THEX format" as implemented by
this package is a rather more simple thing: just a form of
"ASCII-armouring" of the Tiger Hash as specified by
the aforementioned document.

[para]

This kind of armouring is just a Base32 encoding applied to the
bitstring of the resulting hash. This is
[strong not] [emph strictly] a
Base32 encoding though: Base32 encoding operates on 40-bit
groups of the source bitstring and uses "padding" (adding of the
"=" character at the end of the output string) to specify that
the length of the source string isn't an integer multiply of 40
bits (see [sectref REFERENCES [lb]3[rb]]).
THEX format doesn't use such padding, so the transformation of
a TTH bitstring to the THEX format is roughly equivalent to this
code (using [package base32] package from [package tcllib]):
[example {
package require base32
package require tth
set hash [tth::tth digest -raw -string $source]
set thex [string trimright = [base32::encode $hash]]
}]

To illustrate, here's what generates [cmd {tth::tth digest}] with
the [option -thex] option and [cmd base32::encode]
from the [package base32] package applied to the same bitstrings:

[example {
package require base32
package require tth

% ::tth::tth digest -thex -192 -string ""
LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ
% ::base32::encode [::tth::tth digest -raw -192 -string ""]
LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ=

% ::tth::tth digest -thex -160 -string ""
LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHH
% ::base32::encode [::tth::tth digest -raw -160 -string ""]
LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHH

% ::tth::tth digest -thex -128 -string ""
LWPNACQDBZRYXW3VHJVCJ64QBY
% ::base32::encode [::tth::tth digest -raw -128 -string ""]
LWPNACQDBZRYXW3VHJVCJ64QBY======
}]

As can be seen, only bitstrings of 160 bits produce equal
results when fed to both methods since 160 is an intergal
multiply of 40.

[para]

Another thing which is worth mentioning is that the THEX
format document doesn't specify permitted lengths of hash
bitstrings to be used. So the implementation of this package
assumes that bitstrings of 160 and 128 bits are allowed to be
formatted using THEX.

[para]

Note also that both DirectConnect (NMDC) and ADC use
(canonical) 192-bit hashes and transmit them using THEX format.

[section {USAGE CONSIDERATIONS}]

TODO: stress that data almost always should come from binary
channels, if they're used.

[section EXAMPLES]

[list_begin bullet]
	[bullet]
	Calculate TTH on a file:
	[example {
set fd [open somefile.dat]
fconfigure $fd -translation binary
set hash [tth::tth digest -chan $fd]
close $fd
	}]

	[bullet]
	Calculate TTH of a data being read from an asynchronous
	network socket:
	[example {
set sock [socket $host $port]

variable $sock
upvar 0 $sock state

set state(context) [tth::tth init]

fconfigure $sock -translation binary -blocking no
fileevent readable $sock [list ::GetData $sock]

proc ::GetData sock {
	variable $sock
	upvar 0 $sock state

	if {![eof $sock]} {
		# Update digest with the new chunk of data:
		tth::tth update $state(context) [read $sock]
	} else {
		# Get final digest and close the socket
		# (TTH context will be freed automatically):
		set state(digest) [tth::tth digest $state(context)]
		close $sock
	}
}

vwait ${sock}(digest)

puts "Digest is: $state(digest)"

unset state
	}]

[list_end]

[section REFERENCES]

[list_begin enum]
	[enum]
	Tiger Hash home page:
	http://www.cs.technion.ac.il/~biham/Reports/Tiger/

	[enum]
	Tree Hash Exchange (THEX) memo:
	http://www.open-content.net/specs/draft-jchapweske-thex-02.html

	[enum]
	"The Base16, Base32, and Base64 Data Encodings" (RFC):
	http://tools.ietf.org/html/rfc4648
[list_end]

[section BUGS]

The current version of this package also supports one
undocumented option to the [cmd tth::tth] command:
[option -mmap], support for which is enabled and compiled in on
platforms which have support for [fun mmap()] and
[fun munmap()] syscalls (namely, systems conforming to POSIX).
This option is semantically analogous to the [option -chan] and
[option -string] options and accepts the name of a file to
be processed.
The file is read sequentially using [fun mmap()]/[fun munmap()]
calls. This approach has surprisingly proven to be much slower
than the current top-speed method -- [option -chan].
So the future of this option is uncertain.

[section AUTHORS]

This package is created by
Konstantin Khomoutov <flatworm@users.sourceforge.net>

[section CREDITS]

[list_begin bullet]
	[bullet]
	The implementation of the Tiger Hash algorithm used in this
	package is done by its authors: Eli Biham and Ross Anderson.

	[bullet]
	The implementation of the Tiger Tree Hash algorythm used in
	this package is done by Gordon Mohr in its "tigertree"
	project.
[list_end]

[see_also tcllib base32 base64 md5 sha1]
[comment {vim:sytnax=tcl:tw=72:noet}]
[manpage_end]

